A tagger/lemmatiser for Dutch medical language

Author

  • Peter Spyns
Abstract

This paper describes a tagger/lemmatiser for Dutch medical vocabulary, which consists of a full-form dictionary and a morphological recogniser for unknown vocabulary, coupled to an expert-system-like disambiguation module. Attention is also paid to the main data structures: a lexical database and feature bundles implemented as directed acyclic graphs. Some evaluation results are presented as well. The tagger/lemmatiser currently functions as a lexical front-end for a syntactic parser. For pure tagging/lemmatising purposes, a reduced tagset (not suited for sentence analysis) can be used as well.
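
The three-stage design named in the abstract (full-form dictionary lookup, a morphological recogniser for unknown words, and a rule-based disambiguation step) can be illustrated with a minimal Python sketch. This is not the author's implementation: the dictionary entries, the suffix rules, the tag names and the single disambiguation rule are all hypothetical, and the feature bundles are simplified to flat lemma/tag pairs rather than directed acyclic graphs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str
    tag: str


# Hypothetical full-form dictionary: surface form -> candidate analyses.
FULL_FORMS = {
    "de": [Analysis("de", "DET")],
    "arts": [Analysis("arts", "NOUN")],
    "onderzoekt": [Analysis("onderzoeken", "VERB")],
    "het": [Analysis("het", "DET"), Analysis("het", "PRON")],
    "hart": [Analysis("hart", "NOUN")],
}

# Hypothetical suffix rules for unknown words: (suffix, replacement, tag).
SUFFIX_RULES = [
    ("itis", "itis", "NOUN"),  # e.g. medical terms ending in -itis
    ("en", "", "NOUN"),        # crude plural stripping, illustration only
]


def analyse(token):
    """Return candidate analyses: dictionary first, suffix guesser second."""
    word = token.lower()
    if word in FULL_FORMS:
        return list(FULL_FORMS[word])
    for suffix, replacement, tag in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: len(word) - len(suffix)]
            return [Analysis(stem + replacement, tag)]
    return [Analysis(word, "UNKNOWN")]


def disambiguate(candidates):
    """Pick one analysis per token using one hand-written contextual rule:
    an ambiguous token is read as DET if the next token can be a NOUN,
    otherwise its first non-DET reading is preferred."""
    chosen = []
    for i, options in enumerate(candidates):
        choice = options[0]
        if len(options) > 1:
            has_next = i + 1 < len(candidates)
            next_tags = {a.tag for a in candidates[i + 1]} if has_next else set()
            if "NOUN" in next_tags:
                preferred = [a for a in options if a.tag == "DET"]
            else:
                preferred = [a for a in options if a.tag != "DET"]
            if preferred:
                choice = preferred[0]
        chosen.append(choice)
    return chosen


def tag_and_lemmatise(tokens):
    """Run the whole pipeline and return (token, lemma, tag) triples."""
    analyses = disambiguate([analyse(t) for t in tokens])
    return [(t, a.lemma, a.tag) for t, a in zip(tokens, analyses)]
```

Calling `tag_and_lemmatise(["De", "arts", "onderzoekt", "het", "hart"])` resolves the ambiguous "het" to a determiner because the following token has a noun reading. A word absent from the dictionary, such as "gastritis", falls through to the suffix rules.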

Similar resources

EUSLEM: A lemmatiser/tagger for Basque

This paper presents relevant issues that have been considered in the design and development of a general purpose lemmatiser/tagger for Basque (EUSLEM). The lemmatiser/tagger is conceived as a basic tool for other linguistic applications. It uses the lexical database and the morphological analyser previously developed and implemented. We will describe the components used in the development of t...

Open Source Corpus Analysis Tools for Malay

Tokenisers, lemmatisers and POS taggers are vital to the linguistic and digital furtherment of any language. In this paper, we present an open source toolkit for Malay incorporating a word and sentence tokeniser, a lemmatiser and a partial POS tagger, based on heavy reuse of pre-existing language resources. We outline the software architecture of each component, and present an evaluation of eac...

Developing tools and resources for the biomedical domain of the Greek language

This paper presents the design and implementation of terminological and specialized textual resources that were produced in the framework of the Greek research project "IATROLEXI". The aim of the project was to create the critical infrastructure for the Greek language, i.e. linguistic resources and tools for use in high level Natural Language Processing (NLP) applications in the domain of biome...

Developing Text Resources for Ten South African Languages

The development of linguistic resources for use in natural language processing is of utmost importance for the continued growth of research and development in the field, especially for resource-scarce languages. In this paper we describe the process and challenges of simultaneously developing multiple linguistic resources for ten of the official languages of South Africa. The project focussed o...

Transferring PoS-tagging and lemmatization tools from spoken to written Dutch corpus development

We describe a case study in the reuse and transfer of tools in language resource development, from a corpus of spoken Dutch to a corpus of written Dutch. Once tools for a particular language have been developed, it is logical, but not trivial to reuse them for other types or registers of the language than the tools were originally designed for. This paper reviews the decisions and adap...

Publication date: 1996